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Abstract 

Discretizing  the  first-kind  integral  equations  which  model  many  physi- 
cal measurement  processes  yields  an  ill-conditioned  linear  regression  model 
b = Ax*  + 77,  where  x*  is  a vector  representation  of  the  function  being  mea- 
sured, A is  an  instrument  response  matrix,  b is  a vector  of  measurements,  and 
77  is  a vector  of  unknown,  random  measuring  errors.  Least  squares  estimation 
usually  gives  a sum  of  squared  residuals  much  smaller  than  the  expected  value 
and  a wildly  oscillating,  physically  implausible  estimate  of  x*.  These  symp- 
toms suggest  that  the  least  squares  estimate  captures  part  of  the  variance  that 
properly  belongs  in  the  residuals.  One  strategy  for  shifting  some  of  this  vari- 
ance to  the  residuals  and  simultaneously  stabilizing  the  estimate  is  to  truncate 
the  singular  value  decomposition  A = UEV7  where  U and  V are  orthogonal 
matrices  and  £ is  a diagonal  matrix  of  singular  values.  All  of  the  singular  val- 
ues below  some  threshold  value  are  reset  to  zero  to  give  a new  matrix  £fr,  and 
the  estimated  solution  is  calculated  from  the  generalized  inverse  of  the  matrix 
U£/rVT.  The  most  delicate  part  of  this  procedure  is  the  determination  of  the 
truncation  threshold.  Conventionally  this  has  been  regarded  as  a problem  of 
determining  the  “numerical  rank”  of  A,  but  in  most  cases  A is  clearly  not 
rank-deficient.  This  paper  suggests  an  alternate  strategy  which  uses  the  vari- 
ances of  the  measuring  errors  to  specify  a truncation  for  the  elements  of  the 
rotated  measurement  vector  UTb.  The  idea  is  to  zero  all  of  the  components 
that  are  dominated  by  the  measurement  errors  and  compute  the  estimate  using 
the  full  rank  matrix.  The  problem  of  setting  the  truncation  threshold  becomes 
one  of  deciding  whether  or  not  a measured  value  is  significantly  different  from 
zero,  a procedure  familiar  to  most  experimentalists.  The  paper  also  develops 
some  new  diagnostics  for  the  residuals  which  are  useful  not  only  for  choosing 
the  truncation  level  for  the  (U7b);,  but  also  for  assessing  the  quality  of  an 
estimate  obtained  by  any  procedure. 


1 First  Kind  Integral  Equations  with  Uncer- 
tainties 

First  kind  integral  equations, 

y(t)  = [ K(t,€)x(Z)d£  , (1.1) 

J a 

where  K(t,£)  and  y(t)  are  known  functions,  are  routinely  used  to  model  in- 
strument distortions  in  measuring  an  unknown  function  x(^).  In  that  context, 
they  are  usually  written  as  a system  of  equations 

rb 

Vi  = / Ki(g)x($)  d£  + ei,  i = l,2,  (1.2) 

where  the  A'*(£)  are  known  (previously  measured  or  calculated)  response  func- 
tions of  the  instrument,  the  yl  = y(t{ ) are  measured  values,  corresponding  to 
a discrete  mesh  t\,  • • • , tm > and  the  e*  are  random,  zero-mean  measuring  er- 
rors. In  order  to  estimate  x(£)  it  is  necessary  to  further  discretize  the  system, 
in  the  process  replacing  it  with  a linear  regression  model 

y = Kx*  + e , (1.3) 

where  y is  the  ra-vector  of  measurements,  K is  a known  m x n matrix,  with 
m > n,  and  x*  is  an  unknown  n-vector  whose  components  are  either  discrete 
point  estimates  of  x(£)  on  some  mesh  £i,  £2)  • • • , or  are  the  unknown  coeffi- 
cients in  a truncated  expansion  of  x(^)  in  terms  of  some  set  of  basis  functions 
spanning  the  space  of  possible  solutions.  The  vector  e is  an  m- vector  of  random 
measuring  errors  satisfying 

£(e)  = 0,  f(eeT)=S2,  (1.4) 

where  S is  the  expectation  operator,  0 is  the  m-dimensional  zero  vector  and 
S2  is  the  positive  definite  variance-covariance  matrix  for  e.  In  most  problems 
the  measurement  errors  are  statistically  independent  so 

S2  = diag(s?,  si,  . . . , s2m)  , (1.5) 

where  Si,S2,...,sm  are  the  standard  deviations  of  the  errors.  If  SJ  is  not 
diagonal,  the  model  can  be  transformed  into  one  with  a diagonal  variance 
matrix  by  premultiplying  (1.3)  by  the  inverse  of  the  lower  triangular  Cholesky 
factor  of  S2. 
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There  seems  to  be  a general  misapprehension  in  the  numerical  analysis 
community  that  estimates  of  the  S{  are  seldom  available,  but  in  fact  good 
experimenters  routinely  provide  them.  Estimates  of  the  measurement  errors 
are  not  considered  to  be  something  extra,  but  rather  are  an  integral  part  of 
the  measurements,  and  published  graphs  of  measured  data  will  usually  report 
them  as  ±1<7  error  bars  on  the  plotted  points.  An  analyst  who  fails  to  use 
this  information  implicitly  assumes  that  S2  = s2lm  where  Im  is  the  m-th  order 
identity  matrix  and  s is  an  unknown  scalar  that  can  be,  but  usually  is  not,  es- 
timated from  the  sum  of  squared  residuals  for  the  least  squares  solution.  Such 
an  assumption  may  be  tenable  if  all  the  yi  have  roughly  the  same  magnitude, 
but  in  most  cases,  they  span  a range  of  values,  and  the  magnitudes  of  the  s* 
vary  (usually  nonlinearly)  with  the  magnitudes  of  the  yt. 

In  the  following  it  will  be  assumed  that  S is  a known  diagonal  matrix.  It 
will  also  be  assumed,  as  is  often  the  case,  that  the  errors  are  samples  from 
a multivariate  normal  distribution,  i.e.,  that  e ~ N(  0 , S2  ).  In  §2,  these 
assumptions  will  be  used  to  rescale  the  problem  to  have  errors  with  a stan- 
dard normal  distribution  and  to  derive  a statistical  diagnostic  for  the  sum  of 
squared  residuals  for  any  estimate  of  the  solution  vector.  In  §3  a variant  of  the 
well  known  Phillips  problem  [16]  is  subjected  to  this  scaling  and  the  ordinary 
least  squares  estimate  is  calculated  and  found  to  be  unsatisfactory.  In  §4  the 
residuals  for  that  estimate  are  subjected  to  diagnostic  tests  based  on  the  ex- 
pected value  of  their  sum  of  squares  and  on  their  distribution  when  considered 
as  a time-series.  The  diagnostics  that  are  developed  there  are  useful  for  testing 
any  estimate,  no  matter  how  it  may  be  obtained,  and  will  be  used  throughout 
the  remainder  of  the  paper. 

The  singular  value  decomposition  is  introduced  in  §5  and  the  conventional 
method  for  stabilizing  solution  estimates  by  truncating  the  distribution  of  sin- 
gular values  is  described  in  §6.  This  method  is  based  on  an  assumption  that 
the  matrix  is  rank  deficient  and  depends  critically  on  the  proper  determina- 
tion of  its  “numerical  rank.”  But  for  most  ill-posed  problems,  the  matrix  is 
demonstrably  not  rank  deficient  so  there  is  no  good  reason  for  setting  any  of 
the  singular  values  to  zero.  An  alternate  approach  to  truncating  the  decom- 
position is  given  in  §7.  Rather  than  zeroing  some  of  the  singular  values,  one 
instead  zeroes  small  elements  of  the  rotated  right  hand  side  vector  formed  by 
premultiplying  the  vector  of  measurements  by  the  transpose  of  the  matrix  of 
left  singular  vectors,  i.e.,  the  leftmost  factor  in  the  singular  value  decomposi- 
tion. The  errors  in  this  rotated  measurement  vector  have  a standard  normal 
distribution,  so  it  is  possible  to  establish  a statistical  criterion  for  determin- 
ing whether  or  not  a given  element  is  significantly  different  from  zero.  The 
new  method  is  applied  to  the  modified  Phillips  problem  to  obtain  satisfactory 
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estimates  for  3 different  truncation  levels  and  the  optimum  level  is  taken  to 
be  the  one  which  best  satisfies  the  diagnostic  tests  described  in  §4.  In  §8  the 
method  is  successfully  applied  to  real-world  measurements  of  the  energy  spec- 
trum of  neutrons  produced  by  a certain  nuclear  reaction.  Finally,  §9  gives  a 
brief  discussion  of  algorithmic  considerations  and  of  how  the  method  can  be 
extended  when  the  knowledge  of  the  measurement  errors  is  not  as  complete  as 
might  be  desired. 

2 Scaling  the  Problem 

The  linear  regression  model  in  the  preceding  section  can  be  written 

y = Kx*  + € , e ~ N(  0 , S2  ) , (2.1) 

but  it  is  advantageous  to  scale  it  with  the  matrix  S-1.  Let 

b = S-1y  , A = S-1K  , 77  ee  S^e,  (2.2) 

and  note  that  by  a standard  theorem  of  multivariate  statistics  [2,  Thm.  2.4.4], 
e ~ N(  0 , S2  ) implies  that  77  ~ N(  S-10  , S-1S2[S-1]T  ),  so  the  scaled 
model  can  be  written 

b = Ax*  + 77  , 77  ~ N{  0 , ITO  ) , (2.3) 

or 

b ~N(  Ax*  , Im  ) . (2.4) 

To  see  the  advantage  of  this  scaling,  let  x be  an  estimate  of  x*  and 

r = b — Ax  , (2.5) 

be  the  corresponding  residual  vector.  Since  the  regression  model  can  also  be 
written 

77  = b — Ax*  , (2.6) 

it  is  clear  that  an  estimate  x is  acceptable  only  if  r is  a plausible  sample  from 
the  rj- distribution. 

Since  b — Ax*  ~ N(  0 , Im  ),  it  follows  from  another  standard  statistical 
theorem  [14,  page  140]  that 

||b  - Ax*||2  = (b  — Ax*)T(b  — Ax*)  ~ X2(m ) > (2-7) 
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where  x2(m)  denotes  the  Chi-squared  distribution  with  m degrees  of  freedom, 
and  hence  that 


E { 1 1 b — Ax*||2}  = m , Var  {||b  — Ax*||2}  = 2m  . (2.8) 


These  two  quantities  provide  rough  bounds  for  the  sum  of  squared  residuals 
that  might  be  expected  from  a reasonable  estimate  of  x*.  An  estimate  that 
gives 


m — V2m  < ||b  — Ax||2  < m + V2m  (2.9) 


would  be  quite  reasonable,  but  any  x whose  sum  of  squared  residuals  falls 
outside  the  interval  m — 2\[2 m,  m + 2\j2m  would  be  suspect.  These  rough 
indicators  can  be  sharpened  considerably  by  using  the  cumulative  distribution 
function  for  x2{m)-  More  details  on  this  point  are  given  in  Appendix  A. 


3 Standard  Linear  Regression 

The  standard  approach  for  the  linear  regression  problem  defined  by  [2.3]  is  to 
assume  that  rank(A)  = n and  seek  the  minimum  variance,  unbiased  estimator 
for  x*  by  solving  the  least  squares  minimization  problem 

rmin=  min{(b- Ax)T(b- Ax)}  . (3.1) 

The  solution, 

x=(a7A ) Arb  , (3.2) 

is  called  the  best  linear  unbiased  estimate  of  x*,  but  it  is  well  known  [18, 
Chapt.  1],  [19,  Chapt.  6],  [21,  Chapt.  2]  that  for  regression  models  obtained 
by  discretizing  first  kind  integral  equations,  the  elements  of  x are  patholog- 
ically sensitive  to  small  variations  in  the  elements  of  b,  so  the  presence  of 
the  measuring  errors  leads  to  estimates  that  are  totally  unphysical,  typically 
oscillating  wildly  between  extreme  positive  and  negative  values. 

A useful  test  problem,  with  known  solution,  which  shares  many  of  the  char- 
acteristics of  real  instrument  correction  problems  can  be  obtained  by  discretiz- 
ing a variant  of  the  well  known  Phillips  equation  [16]  and  adding  some  random 
errors  to  the  discrete  yi.  The  problem  and  a discretization  with  m — 300  and 
n — 241  are  described  in  detail  in  Appendices  B and  C.  The  functions  y(t ) 
and  :c(£)  are  plotted  on  the  right  in  Figure  1,  and  on  the  left  are  plotted  the 
functions  K(t,^j)  for  the  discrete  values  = —3.0 , — 1.5  , 0 , 1.5  , 3.0 . All  of 
the  241  K(t,  £fc)  have  the  same  shape  and  subtend  unit  area.  The  300  standard 
deviations  Sj  range  in  value  from  3.49  x 10~n  to  7.78  x 10-6  so  the  errors  in 
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Figure  1:  A variant  of  the  Phillips  problem 

the  y%  are  much  smaller  than  the  thickness  of  the  curve  in  the  plot  of  y{t). 
But  these  small  errors  produce  large  oscillations  in  the  least  squares  estimate 
of  the  solution  vector.  This  is  shown  in  the  plot  in  the  upper  left  corner  of 
Figure  2 where  the  barely  discernable  dashed  curve  is  the  true  solution  x*,  and 
the  solid  curve  oscillating  wildly  about  it  is  the  least  squares  estimate  x.  The 
magnitudes  of  these  oscillation  are  roughly  10'  times  greater  than  the  largest 
random  errors  in  the  yi.  Such  a large  amplification  of  the  measuring  errors  is 
the  rule  rather  than  the  exception  for  ill  posed  problems. 

4 Estimate  Diagnostics 

The  most  commonly  used  estimate  diagnostic  for  linear  regression  problems 
is  the  sum  of  squared  residuals.  For  the  problem  in  the  preceding  section  the 
least  squares  estimate  gives 

rmin  = min  ||b  - Ax||2  = ||b  - Ax||2  = 43.01  (4.1) 

which  is  much  smaller  than  the  value  expected  for  the  true  solution  which,  by 
Eq.  (2.8),  is 

£{||b-  Ax*||2}  = 300.  (4.2) 
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Figure  2:  Least  squares  estimate  and  diagnostics  for  the  modified  Phillips 
problem 


From  Eq.  (2.8)  it  also  follows  that  the  standard  deviation  for  this  expected 
value  is  \/600  = 24.49,  so  the  least  squares  r^in  is  more  than  10  standard 
deviations  smaller  than  the  expected  value.  Thus,  without  even  inspecting  the 
estimate  itself,  one  could  argue  that  x is  unlikely  to  be  a good  approximation 
to  x*. 

Another  good  diagnostic  forjudging  the  acceptability  of  an  estimate  can  be 
gotten  by  considering  the  elements  of  the  vectors  77  and  r as  time  series,  with 
the  element  number  taken  as  the  time  variable.  Since  the  rji  are  statistically 
independent  and  identically  normally  distributed,  they  form  a normally  dis- 
tributed white  noise  series,  so  the  residuals  for  an  acceptable  estimate  should 
constitute  a realization  of  such  a series.  The  residuals  for  the  least  squares 
estimate  are  plotted  against  element  number  at  the  upper  right  of  Figure  2.  It 
is  clear  by  inspection  that  they  are  not  even  approximately  white  noise,  and 
this  is  not  surprising  since  it  can  be  shown  [10,  Section  6.9.1]  that 


r ~ N 


(ata) 


-1 


(4.3) 
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with  the  variance  matrix  becoming  approximately  diagonal  only  when  m is 
much  larger  than  n. 

A rigorous  test  for  white  noise  is  based  on  the  periodogram  [6,  Chapt.  7] 
which  is  an  estimate  of  the  power  spectrum  on  the  frequency  interval  0 < 
/ < 2^,  where  T is  the  sample  spacing  for  the  time  variable.  Here,  the  time 
variable  is  the  element  number  z,  so  T = 1.  The  graph  at  the  lower  left  of 
Figure  2 is  a plot  of  a periodogram  estimate  at  4097  equally  spaced  frequency 
points  on  the  interval  [0,0.5].  The  details  on  how  it  was  computed  are  given 
in  Appendix  D.  For  a white  noise  series,  the  variance  would  be  distributed 
uniformly  over  the  frequency  range,  so  the  fact  that  there  is  much  more  power 
on  the  interval  [0, 0.25]  than  on  [0.25, 0.5]  suggests  that  the  residuals  should 
not  be  regarded  as  white  noise.  A formal  test  of  this  hypothesis  uses  the 
cumulative  periodogram  [6,  Chapt.  7]  (also  described  in  Appendix  D)  which 
is  plotted  as  a solid  curve  at  the  lower  right  of  the  figure.  The  legend  box 
gives  the  length  of  this  curve  which  can  be  compared  with  the  value  1.11803 
expected  for  a pure  white  noise  spectrum.  The  two  dashed  lines  enclose  a 95% 
confidence  band  for  white  noise.  The  cumulative  periodogram  ordinates  for 
such  a series  should  lie  outside  this  band  for  at  most  5%  of  the  frequencies. 
Since  2803  of  the  4096  calculated  ordinates  lie  outside  the  band,  it  is  safe  to 
conclude  that  the  least  squares  residuals  are  not  white  noise. 

5 The  Singular  Value  Decomposition 

Insight  into  the  failure  of  the  least  squares  method  to  give  a reasonable  estimate 
of  x*  can  be  gotten  by  substituting  the  singular  value  decompostion  of  A into 
the  model  (2.3).  Let 

A = U ^ q j VT  , S = diag(a1,<72, . . • ,an)  , (5.1) 

where  o\  > 02  > • • • > <rn,  and 

UTU  = Im  = UUT  , VrV  = I„  = VVr  . (5.2) 

Substituting  (5.1)  into  (2.3)  and  premultiplying  by  UT  gives 

U^b  = ( o ) Vrx*  + UTr!  , UT»j  ~ JV(  0 . Im  ) , (5.3) 

with  the  distribution  of  the  U7  77  vectors  unchanged  because  premultiplication 
by  an  orthogonal  matrix  simply  rotates  all  the  vectors  in  the  distribution 


through  the  same  angle.  Also,  premultiplying  a vector  by  an  orthogonal  matrix 
leaves  its  length  unchanged  so  Eq.  (2.7)  can  be  replaced  by 


lb- Ax*l|2  = 


UTb  — 


O 


VTx* 


X2(rn)  , 


(5.4) 


with  the  x2{m)  distribution  following  from  the  fact  that  the  (Ur7/)j  ~ n(0, 1), 
idependently.  Partitioning  the  m x m matrix  U, 


with  Ui  an 
problem  to 


from  which 


U = (Ui,U2), 


(5.5) 


m x n and  U2  an  m x (m  — n)  submatrix  allows  the  least  squares 
be  written 


min  ||b  — Axil2 


x<Eftn 


mm 

xefln 


Uf 

Uj 


b- 


S 

o 


VTx 


(5.6) 


it  is  quite  evident  that  the  least  squares  solution  must  satisfy 


VTx  = S'"1Ufb  , 

and  that  the  minimum  sum  of  squared  residuals  is 


r2  ■ 

min 


Tii|2 


|b  — Ax||2  = HU2  b 


These  last  two  equations  can  also  be  written 

(VH  = 


and 


r2  • 

min 


z=n+ 1 


and  it  is  easy  to  see  that  Eq.  (4.3)  can  be  written 

r~/V(0,  Im-U,U[)  . 


(5.7) 


(5.8) 


* , i - 1,2,  ••  -,n  , 

T 

(5.9) 

m 0 

Axil2  = Z (U*b)  , 

(5.10) 

(5.11) 


For  the  modified  Phillips  problem  the  value  of  r^in  was  too  small  by  10  stan- 
dard deviations  which  suggests  that  some  of  the  (UTb)i  values  in  the  sequence 
(5.9)  more  properly  belong  in  the  sum  in  (5.10).  This  reasoning  leads  to  the 
idea  of  truncating  the  singular  value  decomposition. 
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6 Truncating  the  Singular  Value  Distribution 

Historically,  the  idea  of  truncating  the  SVD  has  been  treated  as  a problem 
of  determining  the  “numerical  rank”  of  the  matrix  A.  All  of  the  computed 
singular  values  smaller  than  some  threshold  value  are  treated  as  zeroes  which 
were  corrupted  by  rounding  errors  into  small  non-zero  quantities.  Thus  if  ap 
is  the  smallest  singular  value  greater  than  the  truncation  threshold,  then  one 
replaces  the  matrix  S by  a truncated  matrix 

£ 'nr  = diag(cr1, . . . , crp,  0, . . . , 0)  (6.1) 

whose  generalized  inverse  is  given  by 

= diag  ,0,...,0j  . (6.2) 

V°i  °p  J 

The  value  p is  said  to  be  the  “numerical  rank”  of  A.  If  x is  the  estimate  of 
the  solution  to  the  truncated  problem,  then  it  is  required  to  satisfy 

VTx  = SjrU[b  (6.3) 

rather  than  Eq.  (5.7),  which  means  that  Eqs.  (5.9)  and  (5.10)  are  replaced  by 


0 , i = p + 1,  • • • , n , 


and 

m 0 

||b  - Ax||2  = (Urb).  , (6.5) 

i=p+ 1 

and  Eq.  (5.11)  is  replaced  by 

f ~ N [(lm  - UpUj)  Ax' , Im  - U„Uj]  , (6.6) 

where  Up  is  the  submatrix  formed  by  the  first  p columns  of  U.  Since  p < n, 
it  is  clear  that  the  variance  matrix  for  r is  more  nearly  diagonal  than  that 
for  r,  and  that  if  p m,  then  the  A will  be  approximately  independently 
distributed. 

The  truncated  singular  value  approach  was  first  suggested  by  Golub  and 
Kahan  [8]  who  noted  its  similarity  to  the  theoretical  treatment  given  by 
Smithies  [17,  Chapt.  8]  for  the  singular  functions  and  singular  values  of  first 
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kind  integral  equations.  One  of  the  first  to  use  the  method  was  Richard  Han- 
son [13]  who  suggested  that  the  truncation  threshold  should  be  chosen  to  be 
the  smallest  integer  p such  that 

m „ 

y]  (uTb)  < m . (6.7) 

i=p+ 1 

In  view  of  Eqs.  (2.7)  and  (2.8),  this  seems  a very  sensible  choice,  but  his  sug- 
gestion was  apparently  ignored  by  most  other  workers  in  the  held  who  were 
preoccupied  with  the  problem  of  rigorously  defining  the  notion  of  the  “nu- 
merical rank”  of  the  matrix.  An  especially  influential  paper  in  this  line  of 
research  was  a technical  report  [9]  by  Golub,  Klema  and  Stewart  who  used 
techniques  from  perturbation  analysis  to  define  a very  complicated  criterion 
for  rank  determination  which  was  norm-dependent  and  involved  a triplet  of 
numbers  (8,e,p)  with  8 and  e being  numbers  satisfying  ap  > 8 > e > ap+\. 
The  idea  was  to  find  a clear  gap  in  the  distribution  of  singular  values  and  zero 
all  those  on  the  low  side  of  the  gap.  The  trouble  with  this  approach  is  that 
matrices  arising  from  discretizing  first  kind  integral  equations  almost  never 
display  such  a gap.  This  fact  was  clearly  enunciated  by  P.C.  Hansen  [11]  who 
proposed  “...  a natural  division  of  ill-conditioned  matrices  into  two  classes: 
those  with  well-determined  numerical  rank  and  those  with  ill-determined  nu- 
merical rank.”  By  the  latter  he  meant  matrices  whose  singular  values  decayed 
smoothly  from  o\  to  an  with  no  obvious  gap  between  the  larger  and  smaller 
ones.  More  recently,  in  his  new  book  [12,  Chapt.  4],  he  has  written  “The  main 
feature  of  these  problems  is  that  all  the  singular  values  of  the  coefficient  ma- 
trix decay  gradually  to  zero,  with  no  gap  anywhere  in  the  spectrum.  ...  and 
therefore  the  concept  of  ‘numerical  rank’  is  not  useful  for  these  problems.” 
Nevertheless  he  devoted  large  segments  of  the  book  to  problems  of  rank  de- 
termination and  rank  revealing  decompositions,  and  gave  a good  review  of  the 
very  sizable  literature  on  these  subjects. 

The  singular  values  for  the  modified  Phillips  problem  are  plotted  as  dis- 
crete squares  in  Figure  3 which  also  shows  the  elements  of  the  rotated  right 
hand  side  vector  U7b  plotted  as  circles  connected  by  straight  line  segments. 
The  decomposition  was  computed  by  subroutine  SGESVD  from  the  LAPACK 
collection  [1],  using  double  precision  arithmetic  on  a Sparc-20  workstation. 
The  largest  and  smallest  singular  values  are 

a-! (A)  = 2.882  x 105  , a24i (A)  = 2.622  x 10“2  , (6.8) 

which  gives  cond(A)  = 1.099  x 10'.  The  relative  acccracy  of  the  arithmetic 
was  emach  = 2.22  x 10~16,  so  the  smallest  singular  value  is  many  orders  of 
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Singular  Values  and  Vector  |UTb 


Figure  3:  Singular  values  (squares)  and  first  n elements  of  |U7b|  (circles)  for 
the  modified  Phillips  problem 

magnitudes  larger  than  the  effective  zero  level.  Thus  there  is  no  reason  to 
assume  that  rank(A)  < n.  Yet  the  problem  obviously  needs  some  sort  of  trun- 
cation to  prevent  the  estimated  solution  from  capturing  variance  that  properly 
belongs  in  the  residuals.  Fortunately,  the  desired  result  can  be  obtained  by 
leaving  the  singular  values  unchanged  and  truncating  the  rotated  right  hand 
side  vector  UT  b. 

7 Truncating  the  Vector  Urb 

Even  a cursory  inspection  of  the  |U7b|j  plotted  in  Figure  3 reveals  a sharp 
dichotomy  in  the  distribution  at  i = 55.  Before  the  break,  the  upper  envelope 
of  the  distribution  is  decreasing  at  a faster  rate  than  that  of  the  singular  values. 
After  the  break,  the  distribution  is  quite  flat  with  64  of  the  186  values  above 
the  horizontal  line  |U7b|  = 1 and  122  values  below.  That  line  is,  by  Equations 
(5.3),  just  the  one  standard  deviation  level  for  the  random  errors  ( \Jrri)l . If  the 
final  186  elements  of  UTb  were  chosen  randomly  from  an  n(0, 1)  distribution, 
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Singular  Values  and  Vector  |UTb 


Figure  4:  Singular  values  (squares)  and  first  n elements  of  |Urb|  (circles)  for 
the  modified  Phillips  problem  with  no  random  errors 

then  one  would  expect  roughly  62  of  them  to  satisfy  |U7b|j  > 1 and  the  other 
124  to  satisfy  |UTb|j  < 1.  Since  that  is  almost  exactly  what  is  observed,  it  is 
quite  reasonable  to  conclude  that  the  flat  part  of  the  distribution  represents 
elements  of  U7  b that  are  dominated  by  the  measuring  errors.  This  conclusion 
is  reinforced  by  Figure  4 which  gives  plots  of  the  cq( A)  and  |UTb|;  for  the 
same  problem  with  no  random  errors  in  the  b vector.  The  singular  values  are 
identical  to  the  ones  in  Figure  3,  but  the  distribution  of  the  |UTb|;  satisfies  the 
discrete  Picard  condition  over  the  whole  range,  and  Eq.  (5.9)  yields  estimates 
of  the  elements  of  x*  that  are  correct  to  8 significant  digits. 

The  foregoing  observations  suggest  a more  logical  way  to  truncate  the 
singular  value  decomposition.  Rather  than  zeroing  some  of  the  singular  values 
one  should  instead  zero  those  elements  of  |UTb|  that  are  judged  to  be  mostly 
random  error.  The  idea  is  to  pick  a truncation  level  r and  require  the  estimated 
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solution  x to  satisfy 


Kb). 

o, 


^■yT’^\  _ J rr,-  > if  |U  b|;;  > T , 


* = 1,2,  ...,n  . 


(7.1) 


0 , otherwise  , 

The  sum  of  squared  residuals  is  then  given  by 

r2  = ||b  - Ax||2  = ]T  (UTb)'  , 

iei 

where  the  indexing  set  for  the  sum  is 
T = { z |UTb|j  < r , i — 1,  2, . . . , n } (J  { n + 1,  n + 2,  . . . , m } . 
If  Uj  is  the  zth  column  of  U,  then  Eq.  (7.1)  can  be  written 

(ufb) 


(7.2) 


(7.3) 


(Wx), 


if  |u7  b|  > r , 
otherwise  , 
and  if  an  m x n matrix  U i is  defined  by 


0 


z = l,2,...,n,  (7.4) 


Ui  = (ui,u2,  • • -,u„)  , where  u*  = 


u,  , if  | u / b | > t 


0 , otherwise  , 

then  the  estimated  solution  and  residual  vectors  can  be  written 
x — , and  f = (lm  - U,U[)  b . 


(7.5) 


(7.6) 


It  follows  from  a fundamental  theorem  in  multivariate  analysis  [2,  Thm.  2.4.5] 
that 


r ~ 


U,ty)Ax*,  Im-U,U[]  . 


(7.7) 


It  is  clear  that  as  r increases,  the  number  of  non-zero  columns  in  U decreases, 
so  the  variance  matrix  for  r more  nearly  approximates  a diagonal  matrix. 

The  success  of  the  proposed  method  depends  crucially  on  the  choice  of  the 
truncation  level  r.  A safe  and  effective  approach  is  to  try  several  values  of  r 
and  use  the  diagnostics  described  in  Section  4 and  Appendix  D to  make  the 
final  choice.  The  results  of  this  strategy  for  the  modified  Phillips  problem  are 
summarized  in  Table  1.  The  labels  for  the  first,  second  and  final  columns  are 
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Table  1:  Estimate  diagnostics  for  the  modified  Phillips  Problem 


T 

£(b  - A x)? 

%(Ck  G R95 ) 

Length{F} 

I A(T  \rms 

Figure 

0.0 

43.0 

31.6 

1.238 

14.937 

2 

3.0 

235.6 

79.5 

1.211 

5.6865 

5 

4.0 

267.7 

100.0 

1.205 

0.0106 

6 

4.7 

289.2 

100.0 

1.209 

0.0122 

5.0 

311.4 

100.0 

1.207 

0.0132 

9.0 

380.7 

100.0 

1.206 

0.0148 

self  explanatory.  The  third  column  contains  the  percentage  of  the  4096  cumu- 
lative periodogram  ordinates  that  lie  inside  the  95%  confidence  band  for  white 
noise.  The  fourth  column  contains  the  length  of  the  cumulative  periodogram 
plot  which  can  be  compared  with  the  length  1.11803  for  a pure  white  noise 
series.  The  fifth  column  gives  the  root  mean  square  average  magnitude  of  the 
errors  in  the  elements  of  the  estimated  solution  vector,  i.e., 


| Aar  | 


\ 


n 


£ 

3= 1 


X, 


X 


*12 


(7.8) 


It  can  be  included  only  because  the  true  solution  is  known.  The  first  row  in 
the  table  gives  the  results  for  the  least  squares  estimate  shown  in  Figure  2. 

A guideline  for  choosing  a lower  bound  for  the  value  of  r is  the  fact  that 
most  experimentalists  would  be  reluctant  to  claim  that  a measured  value  is 
significantly  different  from  zero  if  its  magnitude  does  not  exceed  3 standard 
deviations  for  the  error  in  the  measurement.  Since  each  | U 7 b | * is  scaled  in 
units  of  one  standard  deviation  of  its  own  random  error,  it  suffices  to  choose 
r = 3.0.  The  results  obtained  for  this  choice  are  given  in  Figure  5 and  row  2 
of  the  table.  The  sum  of  squared  residuals,  235.6,  is  less  than  the  1 percentile 
value  for  the  x2(300)  distribution  (see  column  6 of  Table  3,  in  Appendix  A), 
i.e., 


Pr  {||b  - Ax*||2  < 235.6}  < Pr  { ||b  - Ax*||2  < 245.3}  = 0.01.  (7.9) 

Furthermore,  only  79.5%  of  the  cumulative  periodogram  ordinates  lie  inside 
the  95%  band  for  white  noise.  The  plot  of  the  estimate  reveals  the  persistence 
of  strong  high  frequency  oscillations  which,  by  inspection  of  Figure  3,  can  be 
attributed  to  the  component  |UT6|2o5  = 3.547.  Repeatedly  measuring  a stan- 
dard normally  distributed  quantity  will  give  a measurement  with  magnitude 
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Trunc.  SVD , | U T b | j > 3 . 0 Z(b-Ax)|2  = 


Resid.  Per  i odogram  Cum.  Per  i odog  ram 


Figure  5:  Truncated  solution  and  diagnostic  plots  for  the  modified  Phillips 
problem  with  r = 3 

greater  than  3 roughly  once  in  every  385  measurements  so  the  occurrence  of 
one  such  value  in  the  final  186  |U76|j  is  not  an  unlikely  event.  Its  effect  can 
be  transferred  from  the  estimated  solution  to  the  residuals  by  increasing  the 
truncation  threshold  to  r = 4.  The  results  for  that  value  are  given  in  Figure  6 
and  row  3 of  the  table.  The  sum  of  squared  residuals,  267.7,  differs  from  the 
expected  value  by  about  1.3  standard  deviations.  While  this  is  not  a small 
difference,  it  is  clear  from  column  6 of  Table  3 that 

0.05=  Pr  { ||b  — Ax*||2  < 260.6}  < Pr  { ||b  - Ax*||2  < 267.7}  , (7.10) 

so  267.7  must  be  regarded  as  an  acceptable  value.  This  conclusion  is  reinforced 
by  the  fact  that  none  of  the  cumulative  periodogram  ordinates  lie  outside  the 
95%  band.  The  plot  of  the  estimated  solution,  at  the  upper  left  of  Fig.  6,  is 
so  close  to  that  of  the  true  solution  that  the  two  are  almost  indistinguishable. 

Since  there  are  only  241  discrete  | U7  b | * , there  are  only  241  possible  trun- 
cated estimates  x.  Only  55  of  them  correspond  to  truncation  levels  greater 
than  r = 4.  The  smallest  four  of  those  are 

|UTb|56  = 4.63,  |UTb|5i  = 4.71,  |UTb|54  = 8.33,  |UTb|52  = 9.49.  (7.11) 
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Figure  6:  Truncated  solution  and  diagnostic  plots  for  the  modified  Phillips 
problem  with  r = 4 


These  values  define  3 ranges  for  r,  in  each  of  which  the  estimates  are  all 
identical.  For  the  lowest  2 ranges,  the  results  for  r = 4.7  and  r = 5.0  are  given 
in  the  fourth  and  fifth  rows  of  Table  1.  In  both  cases,  the  estimated  sum  of 
squared  residuals  lie  within  one  standard  deviation  of  the  expected  value,  and 
all  of  the  cumulative  periodogram  ordinates  lie  in  the  white  noise  band.  The 
highest  range  is  represented  by  r = 9 with  the  results  summarized  in  the  row 
6 of  the  table.  Although  all  of  the  periodogram  ordinates  lie  within  the  95% 
confidence  band  for  white  noise,  the  sum  of  squared  residuals,  380.7,  is  much 
too  large  to  be  acceptable  since,  from  column  6 of  Table  3, 

P r { 1 1 b — Ax*||2  > 380}  < Pr  {||b  - Ax*||2  > 359.1} 

= 1 - Pr {||b  — Ax*||2  < 359.1}  (7.12) 

= 0.01  . 

Thus,  the  only  estimates  to  give  acceptable  values  for  the  sum  of  squared 
residuals  are  the  ones  corresponding  to  truncation  levels  r = 4.0,  4.7,  and  5.0. 
Since  the  one  for  r = 4.0  gave  the  smallest  Length{C},  it  was  chosen  as  the 
most  acceptable,  and  this  choice  is  validated  by  the  fact  that  it  also  gave  the 
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smallest  |Ax|rms.  But  the  other  two  estimates  are  quite  good  also,  and  are 
qraphically  almost  indistinguishable  from  the  r = 4.0  estimate. 


8 A Real-World  Example 

The  measurement  of  nuclear  radiation  spectra  is  one  important  source  of  first 
kind  integral  equations  with  uncertainties.  Consider  the  energy  spectrum  of 
the  neutrons  produced  by  the  nuclear  reaction 

T ( d , n ) 4 He  , (8.1) 

i.e.  tritium  nuclei  bombarded  with  deuterons  to  produce  helium  nuclei  and 
the  neutrons  being  measured.  If  the  bombarding  particles  are  monoenergetic 
then  so  are  the  neutrons  produced,  with  energy  in  the  range  12  - 22  MeV 
(million  electron  volts),  depending  on  that  of  the  bombarding  particles.  Al- 
though the  neutrons  are  monoenergetic,  the  measuring  instrument  both  smears 
and  distorts  the  expected  spectrum.  The  instrument  in  question,  an  NE-213 
spectrometer,  has  been  described  by  Verbinski,  et.  al.  [20]  and  Burrus  and 
Verbinski  [5].  Its  response  functions  are  plotted  in  Figure  7,  where  incident 
energy  corresponds  to  the  variable  £ and  pulse  height  to  the  variable  t in  Eqs. 
(1.1).  The  incident  neutrons  are  absorbed  by  a plastic  scintillator  which  emits 
a light  pulse  whose  size  is  proportional  to  the  incident  energy.  The  light  pulse 
is  in  turn  detected  and  amplified  by  a photomultiplier  tube  which  transmits 
a corresponding  electrical  pulse  to  an  electronic  circuit  which  then  increments 
a counter  in  one  of  m — 133  pulse-height  bins  designed  to  cover  and  subdi- 
vide the  energy  range  of  all  possible  incident  neutrons.  The  integral  equations 
modelling  the  process  are 

(k  = [EuP  Kl(E)N(E)dE  + £i,  i = 1,2,  ...,113  , (8.2) 

where  c*  is  the  number  of  pulses  counted  in  the  zth  bin,  N(E)  is  the  unknown 
number  of  neutrons  at  energy  E,  and  the  KZ{E)  are  the  instrument  response 
functions  for  the  whole  detector  system.  For  a given  value  of  E,  the  quantity 
Ki(E)dE  is  the  probability  that  a neutron  with  energy  in  the  range  E ± \dE 
would  produce  a light  pulse  that  would  increment  the  count  in  the  fth  pulse- 
height,  bin.  Hopefully,  the  bin  number  in  which  a neutron  is  counted  would 
increase  as  the  energy  increases,  but  an  inspection  of  Figure  7 shows  that  this 
does  not  always  happen.  Ideally  the  A'Z-(E)  should  be  a series  of  narrow  peaks 
distributed  along  a diagonal  line  running  from  the  upper  left  to  the  lower  right 
of  the  energy,  pulse-height  domain.  The  figure  does  exhibit  a ridge  along  that 
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Figure  7:  Instrument  response  functions  for  the  Burrus  neutron  spectrum 
problem.  The  quantity  log10[l  + I\l(E)\  is  plotted  in  order  to  more  clearly 
show  the  structure  for  higher  energies. 

direction,  but  it  attenuates  to  a barely  discernible  ripple  for  higher  energies. 
Even  worse,  the  long  energy  tails  for  the  lower  pulse-height  bins  make  it  more 
likely  that  a higher  energy  neutron  will  produce  an  increment  in  a lower  rather 
than  a higher  bin. 

Figure  8 shows  a measured  spectrum  obtained  for  the  neutrons  from  the 
reaction  (8.1).  Note  that  count  rate  rather  than  counts  are  plotted  against 
pulse-height.  The  experimental  procedure  is  to  accumulate  counts  for  a time 
sufficient  to  allow  good  estimates  to  be  made  for  the  uncertainties  in  the  count 
totals  and  then  to  divide  by  the  elapsed  time  to  get  rates.  In  Eq.  (8.2)  this 
division  replaces  the  c*  with  count  rates  yi  and  the  number  N(E)  of  neutron 
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Observed  Pulse— Height  Spectrum 


Puls  e — H e i g h t 


Figure  8:  Measured  pulse-height  spectrum  for  the  Burrus  problem.  The  two 
piecewise  linear  curves  are  the  ± one  standard  deviation  bounds  for  the  mea- 
sured values 

with  a neutron  flux  x(E).  The  result  is 

f EUp 

Vi  = Ki(E)x(E)dE  + , i = 1,2,  ...,113  , (8.3) 

Jo 

where  the  Cj  are  noise  rates  obtained  from  the  Estimates  for  the  standard 
deviations  of  the  latter  quantities  were  obtained  from  the  usual  assumption 
about  counting  errors  which  is  that  the  standard  deviation  for  any  bin  is 
approximately  equal  to  the  square  root  of  the  number  of  counts  in  that  bin. 
Although  this  procedure  assumes  a Poisson  distribution  for  the  error  in  any 
bin,  the  number  of  counts  accumulated  was  sufficient  to  permit  the  use  of  the 
normal  approximation.  Assuming  that  the  measuring  errors  in  any  two  bins 
are  statistically  independent  and  dividing  the  estimated  standard  deviations 
by  the  time  of  accumulation  gives  a diagonal  variance  matrix  S2  for  the  errors. 

When  the  energy  E is  equated  to  the  variable  £,  the  equations  (8.3)  become 
identical  to  those  in  (1.2).  They  were  discretized  using  simple  rectangular 
quadrature  with  an  n = 77  point  energy  mesh.  0.2  MeV  — Ei  < E2  < ■ • • < 
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Singular  Values  and  Vector  |UTb 


Figure  9:  Singular  values  (squares)  and  first  n elements  of  |Urb|  (circles) 
for  the  Burrus  neutron  spectrum  problem.  The  horizontal  line  represents  the 
truncation  level  r = 3.5. 

E7 7 = 18.91  MeV,  to  give  the  linear  regression  model  (1.3).  Multiplying 
through  by  S-1  gave  the  normalized  model  (2.3).  Although  the  two  curves 
plotted  in  Figure  8 indicate  good  statistics  for  the  measurements,  there  is  no 
sign  of  the  dominant  peak  expected  for  monoenergetic  neutrons.  The  sharp 
rise  in  the  lower  pulse-height  bins  contains  most  of  the  counts  that  should  have 
gone  into  that  missing  peak.  The  least  squares  estimate,  which  is  overwhelmed 
by  the  amplified  noise,  gives 

r2mi„  = lib  - Ax||2  = 38.05  (8.4) 

which  is  almost  5 standard  deviations  smaller  than  the  expected  value  113. 
The  singular  values  and  first  n elements  of  |U7b|  are  shown  in  Figure  9. 
The  condition  number  of  the  matrix  is  2.3503  x 105  so  it  is  clearly  not  rank- 
deficient.  The  distribution  of  the  |UTb|j  exhibits  a dichotomy  at  i = 47  where 
it  first  drops  below  the  one  sigma  level  for  the  error.  Table  2 gives  estimate 
diagnostics  for  several  values  of  r.  The  quantities  tabulated  are  the  same  as 
those  in  the  first  4 columns  of  Table  1 All  of  the  non-zero  r give  estimates 
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Table  2:  Estimate  diagnostics  for  the  Burrus  neutron  spectrum  problem 


r 

E(b  - Ax)2 

%(Ck  € $95) 

Length  {£(/)} 

0.0 

38.0 

37.4 

1.255 

3.0 

93.8 

100.0 

1.211 

3.5 

125.7 

100.0 

1.212 

4.0 

150.7 

100.0 

1.211 

5.0 

213.9 

63.6 

1.216 

CLAS 1C,  trunc  = 3.5 
0 . 060 
0 . 050 
0 . 040 
0 . 030 
0 . 020 
0.010 
0 . 000 
-0.010 
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Figure  10:  Truncated  solution  and  diagnostic  plots  for  the  Burrus  neutron 
spectrum  problem  with  r = 3.5 
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with  a strong  peak  at  approximately  14  MeV,  but  only  r = 3.0,  3.5,  and  4.0 
give  residuals  consistent  with  white  noise.  The  sum  of  squared  residuals  for 
r = 3.5  is  closest  to  the  expected  value,  so  it  was  chosen  as  optimal  even  though 
the  length  of  the  cumulative  periodogram  was  slightly  larger  than  those  for 
r = 3.0  and  4.0.  The  estimate  and  diagnostic  plots  for  r = 3.5  are  given  in 
Figure  10.  The  estimate  is  dominated  by  a single  peak  centered  at  14  MeV 
which  is  just  what  was  expected  for  monoenergetic  neutrons. 


9 Discussion  and  Conclusions 

The  primary  message  of  this  paper  is  that  for  ill-posed  problems  in  which  the 
dominant  errors  are  the  measurement  uncertainties  for  the  right  hand  side 
vector,  the  instability  in  estimating  the  solution  is  attributable  to  components 
of  the  measurements  which  are  overwhelmed  by  the  random  error.  In  general, 
these  components  will  not  correspond  exactly  to  specific  elements  of  the  right 
hand  side  vector.  Scaling  the  problem  by  S-1  reduces  it  to  one  in  which  the 
errors  all  have  unit  variance,  and  premultiplying  the  scaled  measurement  vec- 
tor by  Ur  rotates  it  into  a basis  in  which  the  noise  dominated  components 
can  be  isolated  by  well  understood  statistical  criteria.  Zeroing  the  noise  dom- 
inated components  stabilizes  the  solution  estimate  without  altering  the  kernel 
matrix  which  is  assumed  to  be  known  more  accurately  than  the  right  hand 
side  vector.  Of  course  this  procedure  does  not  guarantee  that  the  estimated 
solution  will  be  close  to  the  true  solution  since  the  noise  may  be  large  enough 
to  overwhelm  important  components  of  the  signal.  The  only  way  to  retrieve 
such  components  would  be  to  repeat  the  measurments  with  more  accuracy. 

The  traditional  method  of  truncating  the  singular  value  decomposition  by 
zeroing  some  of  the  singular  values  amounts  to  altering  a well  known  quantity, 
A,  in  order  to  compensate  for  uncertainties  in  an  unknown  quantity  b*  = Ax*. 
In  the  process  it  reassigns  the  rank  of  the  matrix,  and  since  the  truncation 
level  depends  crucially  on  the  size  of  the  errors  in  the  right  hand  side  vector, 
it  follows  that  the  altered  rank  depends  on  the  size  of  those  errors.  As  an 
example,  consider  Figure  11  which  is  a plot  of  the  cq  and  |UTb|j  for  a problem 
obtained  from  the  one  illustrated  in  Figure  3 by  increasing  all  the  errors  by 
a factor  of  10.  Both  figures  display  the  flat  segment  of  the  |U7b|  curve, 
which  corresponds  to  error  saturation,  but  for  the  problem  with  the  larger 
errors,  this  flat  segment  intersects  the  singular  value  distribution  at  a point 
higher  and  to  the  left  of  that  for  the  problem  with  the  smaller  errors.  Thus,  if 
the  truncation  required  to  stabilize  the  estimate  were  accomplished  by  zeroing 
singular  values,  the  problem  with  larger  errors  would  assign  a smaller  numerical 
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Singular  Values  and  Vector  |UTb 


Figure  11:  Singular  values  (squares)  and  first  n elements  of  | U 7 b | (circles) 
obtained  by  increasing  the  errors  in  the  modified  Phillips  problem  by  a factor 
of  10 

rank  to  the  matrix  than  the  one  with  smaller  errors.  Although  it  seems  absurd 
to  have  the  rank  of  a matrix  depend  on  the  right  hand  side  vector,  precisely 
this  suggestion  was  made  quite  recently  by  P.C.  Hansen  [12,  Chapt.  7]  who 
defined  an  “effective  numerical  rank”  which  is  not  even  an  integer.  A better 
approach  might  be  to  simply  abandon  the  notion  of  “numerical  rank” , but  the 
huge  vested  interest  in  rank  determination  and  rank  revealing  decompositions 
will  probably  preclude  such  a course  of  action. 

The  truncation  procedure  described  in  this  paper  depends  critically  on 
the  availability  of  good  estimates  for  the  uncertainties  in  the  elements  of  the 
measured  vector.  A good  experimenter  will  usually  provide  such  estimates, 
but  it  is  not  uncommon  for  them  to  be  systematically  too  small  or,  more 
rarely,  too  large.  Often  the  form  of  the  functional  dependence  of  the  errors 
on  the  magnitudes  of  the  measurements  is  correct  but  the  estimated  standard 
deviations  are  all  too  small,  or  too  large,  by  a constant  factor.  In  such  cases, 
the  diagnostics  introduced  in  Section  4 can  be  used  to  pinpoint  the  discrepancy 
and  rectify  it.  The  usual  symptom  is  that  the  threshold  r at  which  the  estimate 
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begins  to  be  stabilized  is  quite  different  from  3 or  4 and  also  different  from  the 
value  which  gives  residuals  closest  to  white  noise,  as  measured  by  the  length  of 
the  cumulative  periodogram.  In  such  cases  it  is  a straightforward  though  often 
tedious  matter  to  adjust  the  scale  of  the  estimated  errors  by  a constant  factor 
which  brings  the  two  values  of  r into  agreement  somewhere  in  the  interval 
3 < r < 5. 

The  situation  is  more  difficult  when  no  estimates  are  given  for  the  mea- 
surement errors.  The  usual  approach  is  to  implicitly  assume  that  S2  = s2Im 
where  s is  an  unknown  constant  which  can  be  estimated  from  the  least  squares 
sum  of  squared  residuals.  Of  course  the  latter  quantity  is  usually  much  smaller 
than  the  expected  value  ra,  but  the  size  of  s can  then  be  adjusted  upward  to  a 
value  such  that  some  truncation  level  in  the  range  3.0  < r < 5.0  gives  residuals 
close  to  white  noise  and  a sum  of  squared  residuals  close  to  the  expected  value. 
This  usually  iterative  technique  works  well  if  the  standard  deviations  of  the 
measured  errors  are  truly  constant.  If  not,  it  may  still  be  possible  to  assume  a 
functional  form  for  the  uncertainties  which  will  determine  the  matrix  S2  up  to 
an  unknown  scaling  constant  which  can  then  be  adjusted  for  consistency  in  the 
r values.  The  disadvantage  in  this  approach  is  the  amount  of  work  involved 
in  validating  a functional  form  and  adjusting  the  scaling,  but  sometimes  the 
results  obtained  justify  the  effort. 

A number  of  people  have  suggested  that  the  truncation  procedure  advo- 
cated in  this  paper  is  incomplete  because  it  has  not  been  accompanied  by  an 
algorithm  for  automatically  determining  the  optimum  r,  using  some  criterion 
like  those  used  in  choosing  the  smoothing  parameter  for  regularization  meth- 
ods, e.g.,  generalized  cross-validation,  or  the  L-curve  method.  In  practice,  for 
any  given  problem,  only  a few  discrete  truncations  give  estimates  with  resid- 
uals that  have  both  a reasonable  sum  of  squares  and  an  acceptable  spectral 
distribution,  and  it  is  often  the  case  that  the  two  criteria  are  optimized  by 
different  truncation  levels.  Thus  some  judgement  is  required  to  choose  which 
truncation  level  to  use.  Also  there  is  little  to  be  gained  computationally  by 
confining  the  calculations  to  a single  r.  The  amount  of  work  required  is  dom- 
inated by  the  computation  of  the  singular  value  decomposition,  but  once  that 
is  accomplished  it  is  relatively  easy  to  isolate  the  few  possibly  acceptable  trun- 
cation levels  and,  using  a fast  Fourier  transform  algorithm,  to  calculate  all  of 
the  diagnostics.  Since  the  total  effort  involved  is  small  in  comparison  to  that 
required  to  make  the  measurements,  most  users  with  real  world  problems  will 
not  be  unduly  troubled  by  having  to  make  the  final  choice  of  the  optimum 
value.  Of  course  the  procedure  can  be  automated  by  calculating  the  solutions 
and  diagnostics  on  a suitably  chosen  mesh  for  r.  If  the  mesh  spacing  is  fine 
enough  to  find  all  of  the  possibly  acceptable  solutions,  it  will  produce  many 
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duplicate  solutions,  but  these  will  not  seriously  complicate  the  final  choice  of 
the  parameter  value.  Users  of  regularization  methods  might  also  be  well  ad- 
vised to  consider  the  same  two  residual  diagnostics  in  evaluating  their  favorite 
criteria  for  choosing  the  regularization  parameter. 
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Appendices 


A 


Percentiles  for  the  x2(m) 


Distribution 


The  cumulative  distribution  funtion  for  y2(m)  is  defined  by 

F(*2)  = f ^r%)^exp('t)dz’  (jU) 

and  for  any  probability  a , the  er-point  of  the  distribution  can  be  found  by 
solving  F(\2)  — a.  If  the  solution  is  y2 , then 


Pr{||b-  Ax'||2  <x„}=a,  (A. 2) 

and 

Pr  { ||b  - Ax*||2  > y2}  = 1 — a • (A. 3) 

Standard  tables  (e.g.,  [3])  of  the  ^’-distribution  list  y2  for  various  values  of 
a and  m,  up  to  about  ra  = 30.  For  m > 30,  the  quantity  (>/2x2  — \/2m  — 1 ) 
is  distributed  approximately  like  the  standard  normal  distribution,  so  y2  is 
well  approximated  by 


X 


2 

a 


1 

2 


f,Q  + y/2  m — 1 


(A. 4) 


where  kq  is  the  a-point  of  the  standard  normal  distribution,  i.e. , 


/ 


Ka  l 


\[2/k 


exp  ( ) dz  = a . 


(A. 5) 
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Table  3:  Selected  percentiles  for  the  standard  normal  and  x2  distributions 


a 

Ka 

Xa 

m = 100  m = 150  m — 200  m = 300 

0.01 

-2.326 

69.4 

112.0 

155.7 

245.3 

0.05 

-1.645 

77.7 

122.4 

168.0 

260.6 

0.33 

-0.440 

93.4 

142.0 

190.8 

288.8 

0.50 

0.0 

99.5 

149.5 

199.5 

299.5 

0.67 

0.440 

105.8 

157.2 

208.4 

310.4 

0.95 

1.645 

124.1 

179.3 

233.7 

341.1 

0.99 

2.326 

135.0 

192.4 

248.7 

359.1 

Table  3 gives,  for  various  values  of  a,  the  corresponding  values  of  kq  and,  for 
some  representative  values  of  m,  the  estimates  of  Xq  calculated  by  Eq.  (A. 4). 
Good  public  domain  software  for  computing  xi  can  be  found  in  the  GAMS 
collection  [7,  Class  L5alc]. 


B A Variant  of  the  Phillips  Problem 


A useful  test  problem  which  shares  many  of  the  characteristics  of  real  in- 
strument correction  problems  is  obtained  by  discretizing  a variant  of  the  well 
known  [16]  Phillips  equation.  This  modified  Phillips  problem  can  be  written 


y(t)  = /3  K(t,€)x(€)d£  , —6  < t < 6 , 

with  the  kernel  given  by 

§{l  + cos  [§(£-*)]}  , |£  — *|  < 3 , |*|  <6, 
0 , otherwise  , 

and  the  exact  solution  by 


K(t,0  = 


(B.l) 


(B.2) 


where 


*(£)  = P(€)  + 

k= 1 

5 

(B-3) 

= Mo  [l  + COS  ( J?)]  , 

l£l<3, 

(B.4) 

l 0 

lfl>3, 

26 


and 


c*(0  = 


Ak  {1  + cos[2tt(£  - V'fc)]}  , 

0 , otherwise  , 

with  amplitude  constants  Ak  and  centering  constants  ipk  chosen  to  be 


.Ao  = 0.1  , Ai  = 0.5  , A2  = 0.5  , A3  = 1.0  , 
i>i  = -1.5  , ^2  = 0.5,  ^3  = 1.5. 


(B.5) 


(B.6) 


The  kernel  differs  from  the  Phillips  original  only  in  the  inclusion  of  the  nor- 
malizing factor  jjr  which  was  added  to  assure  that  for  any  £, 

/■3+f 

/ K(t,£)dt  = 1 . (B.7) 

•/-3-k 

For  a measuring  instrument  this  condition  assures  that  conservation  laws  are 
not  violated.  Plots  of  K(t,£)  for  5 representative  values  of  £ are  given  on  the 
left  in  Figure  1. 

The  exact  solution  to  the  original  Phillips  problem  appears  in  a scaled  down 
form  as  the  /?(£)  term  in  the  solution  to  the  new  problem.  The  scaling  constant 
Aq  was  chosen  to  reduce  the  original  solution  to  the  role  of  a background 
function  on  which  to  superimpose  the  three  discrete  peaks  represented  by  the 
Cfc(£)  terms.  The  three  points  £ = ipk  are  the  centers  of  these  peaks  and 
the  constants  2 Ak  are  their  heights  above  the  background.  The  new  solution 
function  is  plotted  as  a dashed  line  on  the  right  in  Figure  1. 

These  changes  in  the  Phillips  problem  were  designed  to  make  it  more  chal- 
lenging and  more  reminiscent  of  real-world  instrument  correction  problems. 
Unfortunately  they  also  make  the  representation  of  the  function  y(t)  more 
complicated.  Substituting  (B.3)  into  (B.l)  gives 

3 3 3 

y{t)  = [ K{t,t)p{£)d£  + ^r  I K(t,£)ck(€)d£  , -6  < t < 6 , (B.8) 

J-3  k=lJ~ 3 

but  care  must  be  exercised  in  evaluating  these  integrals  because  K(t , £)  = 0 
on  half  of  the  rectangular  domain  { (U  £)  | — 6 < t < 6 , — 3 < £ < 3 } and 
each  of  the  ck(C)  is  zero  everywhere  except  on  the  interval  'ipk  — \ <£  < 'lPk  + \- 
The  last  equation  can  also  be  written 

y(t)  = B(t)  + YJCk(t),  (B.9) 

fc=l 

where 

B(t)  = , (B.10) 
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and 


ripk  + k 
'■*Pk 

Evaluating  the  integral  for  B(t)  gives 


Ck(t)  = f*"^  K(t,i)c„(()dS  , k = 1.  2,  3 . (B.ll) 


B(t)  = i.4„  { (6  - |t|) 


1 / 7T  ^ 

1 + 2C°Sl3t 


and  the  integrals  for  Ck{t)  can  be  written 

Ck{t)  = - AkLk(t ) , 
o 


+ ^sin(Ilfl)}  ’ 

— 6 < t < 6 , 

(B.  12) 

fc  = 1,  2,  3 , 

(B.  13) 

where 


0, 


Li{t)  = 


— 6 < t < —5, 


t + 5 4-  ~ sin  [tt(2 t + 9)]  + | sin  f (t  + 2) 
+ifc  {sin  k(2^  + 8)]  - sin  [f  (t  - 1)]  } 
+ife  {sin  [tt(2*  + 10)]  + sin  [f  (t  + 5)  } , 

1 + f {—  sin  f(l-ht)  + sin  |(2  + t)  | 


-5  < t < -4, 


' IOtt 

[sin 

+ 

ob 

+ sin 

f1  - o; 

+ -3- 

' 14t r 

[sin 

1(2-0 

+ sin 

f (5  + 1) 

(B-14) 


-4  < t < 1, 


2 - t + sin  [tt(3  - 2t)]  - J sin  If  (t  + 1) 


+ -3-  1 

' 107T  1 

sin 

>(*  + < 

+ -3-1 

' 147T  1 

sin 

w|4 

4o 

1 

Ob 

i 1 

+ sin  [7t(2  — 2t)]| 
+ sin  [7t(4  - 2t)]| , 


0, 


1 < t < 2, 

2 < t < 6, 
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L2(t)  = 


0, 


t -f-  3 -f-  ^ sin  [n(2t  H-  5)]  -t-  ^ sin 


ft 


3 

107T 

3 

147T 


jsin  [n(2t  + 4)]  — sin 
| sin  [7r(2t  + 6)]  + sin 


f(t“3) 

f(t  + 3)' 


1 + 

£ {sin  [f  (1  - t 

>] 

+ 

— ] 
107T 

sin 

f(2  + <)' 

+ 

-3-  J 

147T 

sin 

1(4- 

t) 

4 - 

t + 

isin[7r(7 

+ 

1 ^ 
co  o 

1 '—■( 

sin 

f(t  + 2) 

+ 

— 

147T 

sin 

1(4- 

t) 

+ sin 
+ sin 


f(3  -t) 
f (3  + t) 


}• 


Sin  [7T 
sin  \i r 


(2t-6)]} 
(2t  — 8)]}  , 


and 


L3(t)  = 


0, 

0, 


t + 2 + A-  sin  [ir(2t  + 3)]  + £ sin  [f  (t  - 1)] 


+ ifc  {sin  [ir(2t  + 2)]  - sin 
+ {sin  M2*  + 4)]  + sin 


f(t"4) 

f(t  + 2)' 


} 

}. 


l + i {sin  [|(2- t)]-sin  [!(!_*)]} 

+T5J  {sin  jK1  +*) 

+ife  jsin  [k5  - *) 


sm 
— sin 
+ sin 


S(t“4) 


} 


-6  <t<  -3, 


3 < t < -2, 


-2  < t < 3, 


3 < t < 4, 


4 < t < 6, 

-6  < t < -2, 


-2  < t < -1, 


-1  < t < 4, 


(B.  15) 


(B.  16) 


5 - t - A-  sin  [n(2t  - 9)]  + £ sin  [f  (2  - i) 

+ ifc  {sin  [f  (t  + !)]  - sin  M2*  - 8)]} 

+ife  {-sin  [f  (t  - 5)]  - sin  [tt(2 t - 10)]}  , 4 < t < 5, 

0 , 5 < t < 6, 

The  function  y(t)  is  plotted  as  the  solid  curve  on  the  right  in  Figure  1.  All  of 
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the  details  of  the  3 peaks  are  so  smeared  together  by  the  convolution  with  the 
kernel  function  that  that  there  is  no  hint  of  any  structure  in  the  underlying 

rr(0- 


C Discretizing  the  Modified  Phillips  Problem 


The  modified  Phillips  problem  described  in  Appendix  B was  discretized  by 
choosing  m = 300  equally  spaced  mesh  points  on  the  interval  —5.9625  < t < 
5.9625  to  give 

y(U)  = i = 1,2, ...,300 , (C.l) 

and  by  replacing  each  of  the  integrals  by  an  n = 241  point  trapezoidal  quadra- 
ture sum,  i.e., 


,3  241 

/ K(ti,€)fa(g)d€  « i = 1,..  .,300  , (C.2) 

J~3  j=i 


where 


( oil , cj2  , o;3  , . . . , oi240  , <^241  )T  — — x ( 1 , 2,  2,  ...,  2,  1)T.  (C.3) 

Defining  the  n-vector  x*  and  the  m-vector  y*  by 


x*  = xfe)  , j = 1 , 2,  . . . , 241  , 

y*  = y(ti)  , 2 = 1,  2,  ...  , 300, 

and  the  m x n matrix  K by 

Kij  = Uj K ( U , Cj)  , 2 = 1,  2 , . . . , 300  , 

, j = 1 , 2 , . . . , 241  , 


(C.l) 


(C.5) 


gives 


y*  = Kx*  + 5 , 


(C.6) 


where  <5  is  an  m-vector  of  quadrature  errors.  A crucial  assumption  in  replacing 
the  intregrals  with  quadrature  sums  is  that  the  value  of  n is  chosen  large  enough 
so  that  the  Si  are  small  relative  to  the  random  measuring  errors  et.  To  assure 
that  this  assumption  was  satisfied  for  the  test  problem,  the  elements  of  the 
vector  y*  were  computed  from  the  matrix-vector  product 


y*  = Kx* 


(C.7) 
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rather  than  from  Eqs.  (B.9)  - (B.16).  More  precisely,  the  matrix  elements 
Kij  were  computed  from  Eqs.  (B.2),  (C.3),  (C.5),  the  vector  elements  x*  were 
computed  from  Eqs.  (B.3)  - (B.6),  and  the  vector  y*  was  then  computed  from 
Eq.  (C.7).  The  “measured”  vector  y was  then  obtained  by  adding  random 
perturbations  to  the  elements  of  this  y*.  Each  perturbation  was  chosen  inde- 
pendently from  a normal  distribution  with  mean  zero  and  standard  deviation 
Si  = (lO-5)^,  so  the  variance  matrix  was 

S-  = diag(si,  si,  . . . , 4xj)  , s,  = (KT5)^  . (C.8) 


D The  Cumulative  Periodogram 


For  a given  time  series,  a periodogram  is  an  estimate  of  the  power  spectrum  of 
the  series,  i.e.,  an  estimate  of  how  the  total  variance  in  the  series  is  distributed 
in  frequency.  Such  an  estimate  can  be  obtained  at  any  desired  number  of  fre- 
quencies fk,  k = 1,  2,  • • • , N,  in  the  interval  0 < / < A,  where  T is  the  sample 
spacing  for  the  time  variable,  and  N is  chosen  to  be  greater  than  or  equal  to 
the  number  of  sample  points  in  the  time  series.  If  the  time  series  is  taken  to 
be  the  residuals  r*  for  an  estimated  solution  to  an  ill-posed  problem,  with  the 
element  number  i as  the  time  variable,  then  T — 1,  and  the  periodogram  is 
estimated  on  an  iV-point  equally  spaced  mesh  on  the  interval  0 < / < 0.5.  It 
is  obtained  by  zero-padding  the  r*  series  to  have  N > m terms  and  computing 
the  discrete  Fourier  transform 

N ( ki\  N 

'R'k=T'^2  r j exp  i 27r  ~J  , k = 0,1,2,...,—  , (D.l) 


where  i --  y/^l  and 


rj  = 


(A x)j  , j = 1,2,..., m , 

, j = m + 1,  m + 2, . . . , N . 


(D.2) 


The  zero-padding  increases  the  density  of  the  frequency  mesh,  but  does  not 
change  the  value  of  the  transform  at  any  given  frequency.  Each  lZk  is  associ- 
tated  with  the  corresponding  Fourier  frequency  fk  = The  periodogram  is 
computed  from  the  transform  by 

1 N 

Vk  = P(fk)  = jjf  \Kk\\  * = 0,1,2,...,  (D.3) 

Peaks  in  the  plot  of  Vk  versus  fk  indicate  the  presence  of  sinusoidal  cycles  in 
the  parent  time  series,  and  the  height  of  each  such  peak  gives  an  estimate  of 
the  power  of  that  cycle. 
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The  cumulative  peridogram  is  defined  by 


Co  = C(0)  = 0 , 


ck  = c(A)  = i £ Vj 

^ 3=1 


(D.4) 


where 


N/2 


s = Y.-Pj- 

3=1 


(D.5) 


Clearly  C(f ) is  a monotonic  nondecreasing  function  of  frequency  ranging  be- 
tween the  values  C(0)  =0  and  C(1/2T)  = 1.  The  relationship  between  the 
periodogram  and  the  cumulative  periodogram  is  illustrated  in  Figures  12  and 
13.  In  the  first  case  the  time  series  is  comprised  of  m = 128  independent 
random  samples  Xi  from  a standard  normal  distribution  (mean  = 0,  variance 
= 1),  and  in  the  second  case  the  series  is  given  by 


/ 27TZ  \ 

Xi  = sml—J+ni,  i = 1,2,  ...,128  , (D.6) 

where  the  rq  are  independent  samples  from  a zero-mean  normal  distribution 
with  variance  = 0.1.  Thus  the  first  series  is  a realization  of  a pure  white  noise 
signal  and  the  second  is  a single  sinusoid,  with  unit  amplitude  and  period 
10,  corrupted  by  zero-mean  white  noise  with  standard  deviation  0.316.  In 
both  cases  the  time  series  was  first  detrended  by  subtracting  out  the  mean 
value  which  was  approximately,  but  not  exactly,  zero.  The  periodogram  for 
the  white  noise  case  has  the  power  distributed  more  or  less  uniformly  on  the 
interval  [0,  0.5],  so  the  cumulative  periodgram  does  not  depart  too  much  from 
a diagonal  line  that  would  connect  the  points  (0,0)  and  (0.5, 1),  i.e.,  the  line 


C(f)  = 277  , 


(D.7) 


which  is  the  theoretical  distribution  for  pure  white  noise.  One  way  to  quantize 
departures  from  white  noise  is  to  compare  the  length  of  that  line, 


\J (0.5)2  + (1.0)2  = 1.11803  , 
with  that  of  the  computed  estimate, 


N/2 

Length  (C)  = ^ 
k= i L 


Ck  — Ck-iY  + {fk  — fk-iY 


(D.8) 


which  is  given  in  the  legend  box  on  the  plot  of  the  cumulative  periodogram. 
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Figure  12:  Periodogram  and  cumulative  periodogram  for  a white  noise  time 
series 
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Figure  13:  Periodogram  and  cumulative  periodogram  for  a single  sinusoid  plus 
white  noise  time  series 
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The  periodogram  for  the  sinusoid  plus  noise  time  series  is  dominated  by 
a single  peak  at  frequency  0.1  (period  = 10)  which  produces  a wide  devia- 
tion from  the  white  noise  diagonal  at  the  same  frequency  in  the  cumulative 
periodogram.  One  way  to  test  whether  of  not  such  a deviation  represents 
a statistically  significant  departure  from  white  noise  is  to  construct  the  two 
parallel  off-diagonal  lines  defined  by 

C(f)  = ±S  + 2Tf,  (D.9) 

where  5 is  the  5%  point  of  the  Kolmogorov-Smirnov  statistic  for  a sample  of 
size  m/2.  These  two  lines,  which  are  plotted  in  both  Figures  12  and  13,  enclose 
a 95%  confidence  band  which  can  be  used  to  test  the  hypothesis  that  the  time 
series  is  a realization  of  white  noise.  The  cumulative  periodogram  ordinates 
for  such  a series  should  lie  outside  this  band  for  at  most  5%  of  the  frequencies. 
None  of  the  ordinates  lie  outside  the  band  for  the  white  noise  case,  but  more 
than  50%  of  them  do  for  the  sinusoid  plus  noise  case. 
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